Efficient and Effective Analysis of Data Quality using Pattern Tableaux
Abstract
Data Auditor is a system for analyzing data quality by exploring data semantics. Given a user-supplied constraint, such as a functional dependency or an inclusion dependency, the system computes pattern tableaux, which are concise summaries of subsets of the data that satisfy (or fail) the constraint. The engine of Data Auditor is an efficient algorithm for finding these patterns; it defers expensive computation on patterns until it is needed during the search, thereby avoiding wasted effort. We demonstrate the utility of our approach on a variety of data, as well as the performance gain from employing this algorithm.
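To make the idea concrete, here is a minimal Python sketch of building a pattern tableau for a functional dependency while deferring the expensive step. It is not the algorithm from the paper, whose details are not given in this abstract. The sketch swaps in a generic lazy greedy cover and assumes that the confidence of a pattern is the largest fraction of its matching rows that can be kept so that the dependency holds exactly. The names `matches`, `fd_confidence`, `lazy_greedy_tableau`, the `"-"` wildcard, and the toy data are all hypothetical.

```python
import heapq
from collections import Counter, defaultdict

WILDCARD = "-"  # assumed notation for "any value" in a pattern


def matches(row, pattern):
    """True if the row agrees with the pattern on every non-wildcard attribute."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def fd_confidence(rows, lhs, rhs):
    """Assumed measure: largest fraction of rows that can be kept
    so that the functional dependency lhs -> rhs holds exactly."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][tuple(r[a] for a in rhs)] += 1
    kept = sum(max(c.values()) for c in groups.values())
    return kept / len(rows)


def lazy_greedy_tableau(rows, candidates, lhs, rhs, min_conf, min_cover):
    """Greedy cover of the rows by high-confidence patterns.

    Confidence is computed only when a pattern reaches the top of the
    heap, so patterns that are never popped are never scored."""
    covers = [frozenset(i for i, r in enumerate(rows) if matches(r, p))
              for p in candidates]
    # Heap of (-upper bound on marginal coverage, candidate index).
    heap = [(-len(c), k) for k, c in enumerate(covers) if c]
    heapq.heapify(heap)
    conf = {}                      # confidences computed so far
    covered, tableau = set(), []
    goal = min_cover * len(rows)
    while heap and len(covered) < goal:
        neg_bound, k = heapq.heappop(heap)
        if k not in conf:          # deferred, expensive step
            conf[k] = fd_confidence([rows[i] for i in covers[k]], lhs, rhs)
        if conf[k] < min_conf:
            continue               # pattern does not satisfy the constraint
        gain = len(covers[k] - covered)
        if gain == 0:
            continue
        if gain < -neg_bound:      # stale bound: re-queue with the true gain
            heapq.heappush(heap, (-gain, k))
            continue
        tableau.append(candidates[k])
        covered |= covers[k]
    return tableau, conf


# Toy relation: zip -> city holds for the UK rows but not for the US rows.
rows = [
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH1", "city": "Edinburgh"},
    {"country": "UK", "zip": "W1", "city": "London"},
    {"country": "US", "zip": "100", "city": "NYC"},
    {"country": "US", "zip": "100", "city": "Albany"},
    {"country": "US", "zip": "200", "city": "Boston"},
]
candidates = [{"country": WILDCARD}, {"country": "UK"}, {"country": "US"}]

tableau, conf = lazy_greedy_tableau(
    rows, candidates, lhs=["zip"], rhs=["city"], min_conf=0.9, min_cover=0.5
)
print(tableau, conf)
```

On this toy relation the all-wildcard pattern is rejected for low confidence, the `country = UK` pattern is selected and already meets the coverage goal, and the `country = US` pattern is never scored. That last point is the effect the deferral is meant to illustrate.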
Related resources
Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux
We present Data Auditor, a tool for exploring data quality and data semantics. Given a rule or an integrity constraint and a target relation, Data Auditor computes pattern tableaux, which concisely summarize subsets of the relation that (mostly) satisfy or (mostly) fail the constraint. This paper describes 1) the architecture and user interface of Data Auditor, 2) the supported constraints for ...
Discovering Pattern Tableaux for Data Quality Analysis: a Case Study
In this paper, we present a case study that illustrates the utility of pattern tableau discovery for data quality analysis. Given a user-supplied integrity constraint, such as a boolean predicate expected to be satisfied by every tuple, a functional dependency, or an inclusion dependency, a pattern tableau is a concise summary of subsets of the data that satisfy or fail the constraint. We descri...
Selecting Energy Efficient Poultry Egg Producers: A Fuzzy Data Envelopment Analysis Approach
This study examined the energy use pattern of poultry for egg production farms of Iran and ranked the selected farmers using fuzzy data envelopment analysis (FDEA) from the viewpoint of energy efficiency. Since data used in our study were not measured precisely, fuzzy forms of them could help us to reach the ideal situations. Hence, the conventional data envelopment analysis (DEA) was remod...
Designing an Optimal Pattern of General Medical Course Curriculum: an Effective Step in Enhancing How to Learn
Introduction: In today's world with a vast amount of information and knowledge, medical students should learn how to become effective physicians. Therefore, the competencies required for lifelong learning in the curriculum must be considered. The purpose of this study was to present a desirable general medical curriculum with emphasis on lifelong learning. Methods: The present study was Mixe...
A General Approach to Mining Quality Pattern-Based Clusters from Microarray Data
Pattern-based clustering has broad applications in microarray data analysis, customer segmentation, e-business data analysis, etc. However, pattern-based clustering often returns a large number of highly overlapping clusters, which makes it hard for users to identify interesting patterns from the mining results. Moreover, a general model for pattern-based clustering is lacking. Different kin...
Journal:
- IEEE Data Eng. Bull.
Volume 34
Publication date: 2011